BioReason Collection BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model • 3 items • Updated 1 day ago • 5
ConTEB training datasets Collection Training data for the InSeNT method. • 3 items • Updated 1 day ago • 1
ConTEB evaluation datasets Collection Evaluation datasets of the ConTEB benchmark. Use "test" split where available, otherwise "validation", otherwise "train". • 8 items • Updated 1 day ago • 1
Article *Context Is Gold to Find the Gold Passage*: Evaluating and Training Contextual Document Embeddings By manu and 1 other • 1 day ago • 13
Comma v0.1 Artifacts Collection A collection of artifacts related to Comma v0.1—a 7B parameter LLM trained on public domain and openly licensed text • 2 items • Updated 2 days ago • 2
Article Interactive Tools for machine learning, deep learning, and math By Suzana • 8 days ago • 40
Article Tiny Agents in Python: a MCP-powered agent in ~70 lines of code By celinah and 3 others • 12 days ago • 117
Changelog Xet is now the default storage option for new users and organizations • 11 days ago • 52
Article NVIDIA Cosmos Now Available On Hugging Face For Physical AI Reasoning By PranjaliJoshi and 1 other • 15 days ago • 24
LightLab: Controlling Light Sources in Images with Diffusion Models Paper • 2505.09608 • Published 20 days ago • 31
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing Paper • 2505.02370 • Published 29 days ago • 14
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • 20 days ago • 109
Article Highlights from the First ICLR 2025 Watermarking Workshop By hadyelsahar and 4 others • 20 days ago • 10
Article LeRobot Community Datasets: The “ImageNet” of Robotics — When and How? By danaaubakirova and 6 others • 24 days ago • 52
Article AI Personas: The Impact of Design Choices By giadap and 1 other • 27 days ago • 14
Hugging Face community’s Wikimedia datasets Collection Wikimedia datasets created by the Hugging Face community, not Wikimedia. Sorted by Wikimedia project. • 17 items • Updated Jun 7, 2024 • 11
SwallowMath Collection Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • 11 items • Updated 27 days ago • 3
SwallowCode Collection Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • 66 items • Updated 27 days ago • 3